Regular expressions

This page explains the regular-expression-based search in Tiny Hexer.



Overview

You can choose to use regular expressions in Tiny Hexer's Find/Replace window to perform powerful searches. If you have ever used the wildcards "*" or "?" in a directory listing (like "dir *.txt"), you have already used a simple relative of regular expressions.

Regular expressions in Tiny Hexer, however, can be much more powerful than those filename wildcards.

Imagine the following: There's a binary file containing address records, but it's corrupted somehow and cannot be opened anymore in the address book application. Now you need the address of a Mr. Johnson, Jonson, Jenssen or is it Jensen? Unfortunately you are not sure about the exact name, and you cannot find his address anywhere else. But you know that the record is stored in this binary address record file. After loading this file into Tiny Hexer you could do sequential searches for all those names in the data, but what if he's called Jehnsen or Jonnsen? Using the regular expression J[oe]h?n+s+[oe]n you can search for all these names at once.
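The name search above can also be tried outside Tiny Hexer. The following is a minimal sketch in Python, assuming that Python's re module treats this particular pattern the same way Tiny Hexer's matcher does. The sample data and the variable names are made up for illustration.

```python
import re

# Hypothetical stand-in for the corrupted address file: names embedded in binary data.
data = b"\x00\x01Johnson\x00Jonson\xffJenssen\x00Jensen\x02Jehnsen\x00Jonnsen\x00Jackson"

# The pattern from the text, written as a bytes pattern so it can scan raw data.
pattern = re.compile(rb"J[oe]h?n+s+[oe]n")

# Collect every match together with its offset in the data.
hits = [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

for offset, name in hits:
    print(f"{offset:#06x}  {name}")
```

All six spellings are found in one pass, while "Jackson" is skipped because its second character is neither "o" nor "e".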

So how does it work? The following sections try to answer this question.



Definitions

Regular expressions are sequences of elements used to match patterns in data. Each element (character, set or group) in a regular expression can match zero, one or more characters or groups of characters, depending on branches and modifiers.

Characters

The simplest element is an ordinary character ("a", "b"...). This element matches exactly that character in the data. In the example above, the character J is used to find the first character of the name (Johnson, Jensen...). To allow searching for binary data, character elements can also be written in hexadecimal form, prefixed with the \x escape (e.g. \x0A for the character with the binary value 0A hex = 10 dec). Characters that have a special meaning in regular expressions must be escaped with a backslash (\) unless they are written in hexadecimal notation. Those characters are ., *, \, +, (, ), ?, |, [ and ].
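Hexadecimal notation and escaping can be illustrated with a short sketch in Python. It assumes that Python's re module handles \x escapes and backslash escaping like Tiny Hexer does, and that an unescaped dot matches any character, as it does in most regular expression dialects. The sample data is made up.

```python
import re

# Made-up sample data: two text lines separated by a line feed (0A hex = 10 dec).
data = b"price: 3.50\nprice: 3x50"

# \x0A matches the byte with the value 0A hex.
line_feed_offset = re.search(rb"\x0A", data).start()
print(line_feed_offset)

# An unescaped dot is a special character, so this pattern also finds "3x50".
unescaped = re.findall(rb"3.50", data)
print(unescaped)

# Escaping the dot restricts the match to a literal ".".
escaped = re.findall(rb"3\.50", data)
print(escaped)
```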

Sets

Sets are used to match any one of the characters included in the set. In the example above, the set [oe] is used to match either the character "o" (Jon.../...son) or the character "e" (Jen.../...sen). Sets are embedded in brackets ([...]). You can include ranges of characters in sets by writing a hyphen between the first and the last value, e.g. to match alphanumerical characters, you may write [0-9a-zA-Z]. You can also negate a set by putting the circumflex (^) right after the opening bracket: to search for any data except alphanumerical characters, you may write [^0-9a-zA-Z].
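A small sketch in Python shows plain, ranged and negated sets side by side. It assumes that Python's re module interprets sets the same way as Tiny Hexer, and it borrows the + modifier (one or more occurrences) from the introductory example to collect whole runs of characters. The sample data is made up.

```python
import re

# Made-up sample data mixing alphanumerical characters with other bytes.
data = b"id=42;name=Bob\x00\x01"

# [oe] matches exactly one character that is either "o" or "e".
simple = re.findall(rb"B[oe]b", data)
print(simple)

# A set with ranges: runs of alphanumerical characters.
alnum_runs = re.findall(rb"[0-9a-zA-Z]+", data)
print(alnum_runs)

# The negated set: runs of anything except alphanumerical characters.
other_runs = re.findall(rb"[^0-9a-zA-Z]+", data)
print(other_runs)
```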

The following predefined sets exist (they do not need to be embedded in brackets):

Groups

Groups connect sequences of elements. They are used in conjunction with modifiers and branches (see below). Groups are embedded in parentheses ((...)). To search for either "John" or "Jane", use the regular expression (John|Jane). To search for either "John" or "Jonny", you might use Joh?n(ny)?; note that this pattern also matches "Jon" and "Johnny".
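To see which names the two group patterns accept, they can be checked against a list of candidates. This is a minimal sketch in Python, assuming Python's re module agrees with Tiny Hexer on these patterns. The helper matching and the candidate list are hypothetical and exist only for this illustration.

```python
import re

candidates = ["John", "Jane", "Jon", "Johnny", "Jonny", "Joan"]

def matching(pattern, names):
    """Return the names that the pattern matches from start to end."""
    return [name for name in names if re.fullmatch(pattern, name)]

# A group holding two alternatives.
john_or_jane = matching(r"(John|Jane)", candidates)
print(john_or_jane)

# An optional "h" plus an optional group "ny".
john_variants = matching(r"Joh?n(ny)?", candidates)
print(john_variants)
```

The second pattern accepts four spellings because the optional "h" and the optional group "ny" can each be present or absent independently.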



Branches

Branches let you search for any one of several alternative groups or elements. They work like logical ORs. The alternatives of a branch have to be separated by a pipe character (|). So if you want to search for either "John Smith", "John Taylor" or "John Q Public", you can use the regular expression John (Smith|Taylor|Q Public).
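The effect of putting the branch inside a group can be sketched in Python. The second pattern below shows what happens in Python's dialect when the parentheses are left out; whether Tiny Hexer scopes an ungrouped branch in exactly the same way is an assumption. The sample text is made up.

```python
import re

text = "John Smith met John Taylor, John Q Public and Mary Taylor."

# The branch is limited to the group, so "John " is required in front of every alternative.
grouped = [m.group(0) for m in re.finditer(r"John (Smith|Taylor|Q Public)", text)]
print(grouped)

# Without the group, the pipe splits the whole expression into three alternatives:
# "John Smith", "Taylor" and "Q Public".
ungrouped = [m.group(0) for m in re.finditer(r"John Smith|Taylor|Q Public", text)]
print(ungrouped)
```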

Modifiers

Modifiers tell the matcher how often the preceding group, element or set must occur in the data.

The following modifiers exist:

The modifier ? has a different meaning if it is the first character in the regular expression: it tells the matcher to be "non-greedy", that is, the modifiers "eat" as few characters as possible to match the pattern. An example: the pattern T.+s finds "This is" in the text "This is a text", while the pattern ?T.+s finds just "This".
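The difference between greedy and non-greedy matching can be reproduced in Python, with one caveat: Python's re module has no leading ? switch for the whole expression. Instead, a ? is appended to each individual modifier (here .+?), so the sketch below is an approximation of the Tiny Hexer behaviour rather than the same syntax.

```python
import re

text = "This is a text"

# Greedy: .+ takes as many characters as possible before the final "s".
greedy = re.search(r"T.+s", text).group()
print(greedy)

# Non-greedy (Python spelling): .+? takes as few characters as possible.
non_greedy = re.search(r"T.+?s", text).group()
print(non_greedy)
```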



Notes


mirkes.de's Tiny Hexer, Copyright © Markus Stephany. All rights reserved.